Parallel Hierarchical Radiosity On Cache-Coherent Multiprocessors

Authors

  • Jim Richard
  • Jaswinder Pal Singh
  • James Thurber
Abstract

Computing radiosity is a very expensive problem in computer graphics. Recent hierarchical methods have greatly sped up the computation, first of diffuse and now also of specular radiosity. We present a parallel algorithm for computing diffuse and specular radiosity together, and examine its performance in detail on cache-coherent shared address space multiprocessors. We compare this with an earlier implementation of parallel diffuse-only radiosity computation, which has very different characteristics. The algorithm is both irregular and highly unpredictable. Despite this, by taking advantage of the cache-coherent shared address space and using distributed dynamic task queues for load balancing, we obtain speedups of 26.7 on a 32-processor machine with distributed memory, with no need for explicit data management, and 14.2 on a 16-processor machine with centralized memory. Both the diffuse and the specular+diffuse programs achieve very good load balance. Because the specular program must process many more source patches for each destination patch than the diffuse program does, it has greater memory stall time; however, the memory access overheads do not increase with the number of processes or lower the speedups attained. Detailed performance debugging was used to identify and understand bottlenecks in the programs, which mostly involve contention for locks that protect shared data structures, even though that locking is very fine-grained. We conclude that moderate-scale shared address space multiprocessors provide a successful platform for speeding up hierarchical radiosity computations, both diffuse and specular.
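The abstract names distributed dynamic task queues and fine-grained locks but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' code: each worker owns a lock-protected queue, refines tasks into subtasks, and steals from other workers when its own queue is empty. The names `WorkerQueues` and `run`, the integer "patch size" task model, and the counter-based termination check are all assumptions made for illustration.

```python
import threading
import time
from collections import deque


class WorkerQueues:
    """One task queue per worker, each guarded by its own lock (hypothetical)."""

    def __init__(self, num_workers):
        self.queues = [deque() for _ in range(num_workers)]
        self.locks = [threading.Lock() for _ in range(num_workers)]

    def push(self, worker, task):
        with self.locks[worker]:
            self.queues[worker].append(task)

    def pop_local(self, worker):
        # The owner takes the newest task from its own queue.
        with self.locks[worker]:
            return self.queues[worker].pop() if self.queues[worker] else None

    def steal(self, thief):
        # An idle worker scans the other queues and takes the oldest task found.
        n = len(self.queues)
        for offset in range(1, n):
            victim = (thief + offset) % n
            with self.locks[victim]:
                if self.queues[victim]:
                    return self.queues[victim].popleft()
        return None


def run(num_workers, root_size):
    """Refine a root 'patch' of root_size units down to unit leaves.

    Returns the number of leaves each worker processed.
    """
    queues = WorkerQueues(num_workers)
    pending = [0]  # tasks queued or in progress
    pending_lock = threading.Lock()
    leaves = [0] * num_workers  # slot i is written only by worker i

    def submit(worker, task):
        with pending_lock:
            pending[0] += 1
        queues.push(worker, task)

    def worker_loop(me):
        while True:
            task = queues.pop_local(me)
            if task is None:
                task = queues.steal(me)
            if task is None:
                with pending_lock:
                    if pending[0] == 0:
                        return  # nothing queued and nothing in progress
                time.sleep(0)  # yield, then retry
                continue
            if task > 1:
                # Stand-in for hierarchical subdivision: split into two subtasks.
                half = task // 2
                submit(me, half)
                submit(me, task - half)
            else:
                leaves[me] += 1
            with pending_lock:
                pending[0] -= 1  # parent is done only after its children are queued

    submit(0, root_size)
    threads = [threading.Thread(target=worker_loop, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return leaves
```

Calling `run(4, 64)` returns a list of four per-worker leaf counts that sum to 64; how the leaves are split among workers varies from run to run, which mirrors the unpredictability the abstract describes. Because every queue has its own lock, contention arises only when a thief and an owner touch the same queue, although the single `pending_lock` in this sketch is itself a shared point of contention.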

Similar resources

Automatic Parallelization for Non-cache Coherent Multiprocessors

Although much work has been done on parallelizing compilers for cache coherent shared memory multiprocessors and message-passing multiprocessors, there is relatively little research on parallelizing compilers for noncache coherent multiprocessors with global address space. In this paper, we present a preliminary study on automatic parallelization for the Cray T3D, a commercial scalable machine ...

Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity

Hierarchical N-body methods, which are based on a fundamental insight into the nature of many physical processes, are increasingly being used to solve large-scale problems in a variety of scientific/engineering domains. Applications that use these methods are challenging to parallelize effectively, however, owing to their nonuniform, dynamically changing characteristics and their need for long-...

Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors

This document describes a set of new techniques for improving the efficiency of compiler-directed software prefetching for parallel Fortran programs running on cache-coherent DSM (distributed shared memory) multiprocessors. The key component used in this scheme is a data flow framework that exploits information about array access patterns and about the cache coherence protocol to predict at compile...

Software Caching on Cache-Coherent Multiprocessors

Programmers have always been concerned with data distribution and remote memory access costs on shared-memory multiprocessors that lack coherent caches, like the BBN Butterfly. Recently memory latency has become an important issue on cache-coherent multiprocessors, where dramatic improvements in microprocessor performance have increased the relative cost of cache misses and coherency transaction...

Publication date: 2007